Mining Frequent Itemsets Using Re-Usable Data Structure
Authors
Abstract
Several algorithms have been introduced for mining frequent itemsets. The recent dataset-transformation approach suffers either from a possible increase in the number of structures produced during the execution of the algorithm or from the processing time needed to project or decompose the datasets. Moreover, the constructed structure cannot be reused in ad hoc mining queries or in other mining processes. In this paper, the ItemSet Tree (IST) structure is used to count itemset support efficiently and so overcome these limitations. To speed up support counting, Guidance Information Bits and a tree-size reduction are proposed. The TDF algorithm is then proposed to find all frequent itemsets. TDF explores the frequent-itemset search space depth-first, generating candidates from the search space and counting their support in the IST. Several experiments have been conducted to study the performance of the TDF algorithm.
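The abstract combines two ideas: a tree built once from the transactions and then reused to count the support of any itemset, and a depth-first walk of the search space that generates candidates and counts each one in that tree. The following is a minimal sketch of that combination, assuming a plain prefix tree (trie) over lexicographically sorted transactions. It is not the paper's ItemSet Tree, it omits the Guidance Information Bits and tree-size reduction, and it is not presented as the TDF algorithm itself. The names `PrefixNode`, `build_tree`, `support_count`, and `mine_frequent` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Itemset = Tuple[str, ...]


class PrefixNode:
    """Node of a prefix tree built from lexicographically sorted transactions.

    Hypothetical structure for illustration; not the paper's ItemSet Tree (IST).
    """

    def __init__(self, item: Optional[str] = None) -> None:
        self.item = item
        self.count = 0  # transactions whose sorted item list passes through this node
        self.children: Dict[str, "PrefixNode"] = {}


def build_tree(transactions: List[List[str]]) -> PrefixNode:
    """Insert every transaction once; the tree can then be queried repeatedly."""
    root = PrefixNode()
    for transaction in transactions:
        root.count += 1
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, PrefixNode(item))
            node.count += 1
    return root


def support_count(node: PrefixNode, itemset: Itemset, i: int = 0) -> int:
    """Count transactions below `node` containing itemset[i:] (itemset must be sorted)."""
    if i == len(itemset):
        return node.count  # every transaction through this node contains the whole itemset
    target = itemset[i]
    total = 0
    for child in node.children.values():
        if child.item == target:
            total += support_count(child, itemset, i + 1)
        elif child.item < target:
            # Skip an item that is not in the itemset; the target may still occur deeper.
            total += support_count(child, itemset, i)
        # child.item > target: a sorted path cannot contain the target further down.
    return total


def mine_frequent(transactions: List[List[str]], min_support: int) -> Dict[Itemset, int]:
    """Depth-first enumeration of frequent itemsets, counting support in the tree."""
    root = build_tree(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent: Dict[Itemset, int] = {}

    def extend(prefix: Itemset, start: int) -> None:
        for k in range(start, len(items)):
            candidate = prefix + (items[k],)
            support = support_count(root, candidate)
            if support >= min_support:
                frequent[candidate] = support
                # Grow this itemset before trying its siblings (depth-first order).
                extend(candidate, k + 1)
            # An infrequent candidate is not extended: its supersets cannot be frequent.

    extend((), 0)
    return frequent


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
    print(mine_frequent(transactions, min_support=2))
    # {('a',): 3, ('a', 'c'): 2, ('b',): 2, ('b', 'c'): 2, ('c',): 3}
```

Because the tree is built once and left unchanged, the same `root` could answer later ad hoc support queries without rescanning the transactions. This is the kind of reuse the abstract argues for, although the paper's IST and its speed-ups are organized differently.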
Similar resources
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is a newly emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
Mining maximal frequent itemsets from data streams
Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The defectiveness of such a two-phase framew...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
An Efficient Method for Mining Frequent Itemsets in Market Basket Data Analysis
Discovering hidden and valuable knowledge in large data warehouses is an important research area and has attracted the attention of many researchers in recent years. Most Association Rule Mining (ARM) algorithms start by searching for frequent itemsets, repeatedly scanning the whole database and enumerating the occurrences of each candidate itemset. In data mining problems, the size of ...
Mining Frequent Closed Itemsets with the Frequent Pattern List
The mining of the complete set of frequent itemsets will lead to a huge number of itemsets. Fortunately, this problem can be reduced to the mining of frequent closed itemsets (FCIs), which results in a much smaller number of itemsets. The approaches to mining frequent closed itemsets can be categorized into two groups: those with candidate generation and those without. In this paper, we propose...